CommAI: Evaluating the first steps towards a useful general AI
Authors
Abstract
With machine learning successfully applied to new daunting problems almost every day, general AI starts looking like an attainable goal (LeCun et al., 2015). However, most current research focuses instead on important but narrow applications, such as image classification or machine translation. We believe this to be largely due to the lack of objective ways to measure progress towards broad machine intelligence. In order to fill this gap, we propose here a set of concrete desiderata for general AI, together with a platform to test machines on how well they satisfy such desiderata, while keeping all further complexities to a minimum.

1 DESIDERATA FOR THE EVALUATION OF MACHINE INTELLIGENCE

Rather than trying to define intelligence in abstract terms, we take a pragmatic approach: we would like to develop AIs that are useful for us. This naturally leads to the following desiderata.

Communication through natural language
An AI will be useful to us only if we are able to communicate with it: assigning it tasks, understanding the information it returns, and teaching it new skills. Since natural language is by far the easiest way for us to communicate, we require our useful AI to be endowed with basic linguistic abilities. The language the machine is exposed to in the testing environment will inevitably be very limited. However, given that we want the machine to also be a powerful, fast learner (see next point), humans should later be able to teach it more sophisticated language skills as they become important to instruct the machine in new domains. Concretely, the environment should not only expose the machine to a set of tasks, but also provide instructions and feedback about the tasks in simple natural language. The machine should rely on this form of linguistic interaction to solve the tasks efficiently.

Learning to learn
Flexibility is a core requirement for a useful AI. As our needs change, the AI should help us with the new challenges we face: from solving a scientific problem in the morning at work to stocking our fridge at night. Progress towards AI should thus be measured on the ability to master a continuous flow of new tasks, with data-efficiency in solving new tasks as a fundamental evaluation component, and without distinguishing train and test phases. We must distinguish this learning-to-learn ability, which pertains to generalization across tasks (Ring, 1997; Schmidhuber, 2015; Silver et al., 2013; Thrun & Pratt, 1997), from 1-shot learning, that is, the challenging but more limited ability to generalize to new classes within the same task (e.g., extending an object classifier to recognize unseen objects from just a few examples; Lake et al., 2015). It is generally agreed that, in order to generalize across tasks, a program should be capable of compositional learning, that is, of storing and re-combining solutions to sub-problems across tasks (Fodor & Lepore, 2002; Lake et al., 2016; Minsky, 1986). The testing environment should thus feature sets of related tasks, such that a compositional learner can bootstrap skills from one task to the other. Finally, mastering language skills might be a crucial component of learning to learn, since understanding linguistic instructions allows us to quickly learn how to accomplish tasks we have never performed before.

Feedback
As we grow up, we learn to master complex tasks with decreasing amounts of explicit reward. A useful AI should possess similar capabilities. Consequently, in our testing environment,

∗Now at DeepMind.
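The evaluation idea described under "Learning to learn" (a continuous flow of tasks, scored by data-efficiency, with no separate train and test phases) can be illustrated with a minimal sketch. This is not the platform proposed by the authors: the `Task` class, the `MemorizingLearner`, the candidate replies, and the mastery criterion (a streak of consecutive rewarded episodes) are all hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Task:
    """Hypothetical task: a natural-language instruction and the rewarded reply."""
    name: str
    instruction: str
    answer: str


class MemorizingLearner:
    """Toy learner (an assumption): tries untested replies, remembers rewarded ones."""

    def __init__(self) -> None:
        self.memory: Dict[str, str] = {}
        self.tried: Dict[str, Set[str]] = {}

    def act(self, instruction: str, candidates: List[str]) -> str:
        # Reuse a reply that was rewarded before for this instruction.
        if instruction in self.memory:
            return self.memory[instruction]
        # Otherwise pick the first candidate that has not failed yet.
        tried = self.tried.setdefault(instruction, set())
        for candidate in candidates:
            if candidate not in tried:
                return candidate
        return candidates[0]

    def observe(self, instruction: str, reply: str, reward: int) -> None:
        if reward > 0:
            self.memory[instruction] = reply
        else:
            self.tried.setdefault(instruction, set()).add(reply)


def episodes_to_mastery(
    learner: MemorizingLearner,
    tasks: List[Task],
    candidates: List[str],
    streak: int = 3,
    max_episodes: int = 50,
) -> Dict[str, Optional[int]]:
    """Feed tasks one after another; every episode counts (no train/test split).

    Returns, per task, the number of episodes needed to reach `streak`
    consecutive rewarded episodes, or None if that never happens.
    """
    results: Dict[str, Optional[int]] = {}
    for task in tasks:
        results[task.name] = None
        consecutive = 0
        for episode in range(1, max_episodes + 1):
            reply = learner.act(task.instruction, candidates)
            reward = 1 if reply == task.answer else 0
            learner.observe(task.instruction, reply, reward)
            consecutive = consecutive + 1 if reward else 0
            if consecutive == streak:
                results[task.name] = episode
                break
    return results
```

A lower episode count means a more data-efficient learner on that task. This toy learner does not transfer anything between tasks; a compositional learner, as discussed above, would be expected to need fewer episodes on later, related tasks.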
Journal title: CoRR
Volume: abs/1701.08954
Pages: -
Publication date: 2017